Grid Workflow Software for a High-Throughput Proteome Annotation Pipeline

Authors

  • Adam Birnbaum
  • Jim Hayes
  • Wilfred W. Li
  • Mark A. Miller
  • Peter W. Arzberger
  • Philip E. Bourne
  • Henri Casanova
Abstract

The goal of the Encyclopedia of Life (EOL) Project is to predict structural information for all proteins in all organisms. This calculation presents challenges both in the scale of the computational resources required (approximately 1.8 million CPU hours) and in data and workflow management. While tools are available that solve some subsets of these problems, it was necessary for us to build software to integrate and manage the overall Grid application execution. In this paper, we present this workflow system, detail its components, and report on the performance of our initial prototype implementation for runs over a large-scale Grid platform during the SC’03 conference.


Similar resources

Grid Portal Interface for Interactive Use and Monitoring of High-Throughput Proteome Annotation

High-throughput proteome annotation refers to the activity of extracting information from all proteins in a particular organism using bioinformatics software on a high performance computing platform such as the grid. The Encyclopedia of Life (EOL) project [1] aims to catalog all proteins in all species for public benefits using an Integrative Genome Annotation Pipeline [2] (iGAP). The intrinsic...


Bioinformatics for plant genome annotation

High throughput sequencing must be matched by high throughput annotation. Given the large number of annotation tools available, a multitude of interdependent analyses are required for an in-depth annotation of even a single BAC sequence. Special annotation pipeline software is required to make such annotation processes feasible in an automated fashion. In terms of functionality, such software s...


Simple high-throughput annotation pipeline (SHAP)

SUMMARY SHAP (simple high-throughput annotation pipeline) is a lightweight and scalable sequence annotation pipeline capable of supporting research efforts that generate or utilize large volumes of DNA sequence data. The software provides Grid capable analysis, relational storage and Web-based full-text searching of annotation results. Implemented in Java, SHAP recognizes the limited resources ...


An integrated pipeline for protein classification using specific PSSMs and existing protein annotations

Protein classification has been performed by many protein databases to infer annotations of unknown proteins and therefore enhance the performance of protein annotation. In this study, we implemented an integrated pipeline for protein classification using specific PSSMs and proteins with the same entity name. After clustering sequences on the basis of their evolutionary distances, a target grou...


Mapping of Scientific Workflow within the e-Protein project to Distributed Resources

The e-Protein project, a BBSRC pilot project, aims to examine the issues in building a structure-based annotation of the proteins in the major genomes by linking resources (computing, software and databases) at three sites using Grid technologies. This paper describes the implementation of the Imperial College annotation pipeline (3D-GENOMICS) within ICENI. The scientific problem of large-scale...



Publication date: 2004